Utterance Topic Model for Generating Coherent Summaries
Abstract
Generating short multi-document summaries has received considerable attention recently and is useful in many settings, including summarizing the answers to a question on an online service such as Yahoo! Answers. This paper defines a new probabilistic topic model that incorporates the semantic roles of words into the document generation process. Words carry syntactic and semantic information, and such information is often carried across adjacent sentences to enhance local coherence in different parts of a document. Examples include the grammatical and semantic role (henceforth GSR) of a word, such as Subject, Verb, Object, adjective qualifiers, or WordNet and VerbNet role assignments. A statistical topic model like LDA [5] usually models topics only as distributions over the word-count vocabulary. We posit that a document can first be topic modeled over a vocabulary of GSR transitions, and that for each transition, words, and hence sentences, can then be sampled to best describe that transition. The topics in the proposed model are thus implicitly also distributions over GSR transitions. We further show how this basic model can be extended to query-focused summarization, where, for a given query, sentences are ranked by the product of their thematic salience and their coherence through GSR transitions. We show empirically that the new topic model has lower test-set perplexity than LDA, and we analyze the performance of our summarization model using ROUGE [13] on the DUC2005 dataset and PYRAMID [17] on the TAC2008 and TAC2009 datasets.
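The two mechanical ideas in the abstract, a vocabulary of GSR transitions between adjacent sentences and ranking sentences by the product of thematic salience and coherence, can be illustrated with a minimal sketch. This is not the paper's model or inference procedure. It assumes each sentence has already been annotated with a role label per word (for example by a parser or role labeler), uses a hypothetical "-" placeholder for a word absent from a sentence, and takes the salience and coherence scores as given inputs, since the abstract does not say how they are computed. The names `gsr_transitions` and `rank_sentences` are illustrative only.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical representation: a sentence maps each word to its GSR label,
# e.g. "S" (subject), "O" (object), "V" (verb).
Sentence = Dict[str, str]

ABSENT = "-"  # assumed placeholder role for a word missing from a sentence


def gsr_transitions(sentences: List[Sentence]) -> Counter:
    """Count (role in sentence i, role in sentence i+1) pairs for every word
    that appears in at least one of two adjacent sentences."""
    counts: Counter = Counter()
    for prev, curr in zip(sentences, sentences[1:]):
        # The union guarantees the word occurs in at least one sentence,
        # so the pair (ABSENT, ABSENT) is never counted.
        for word in set(prev) | set(curr):
            transition = (prev.get(word, ABSENT), curr.get(word, ABSENT))
            counts[transition] += 1
    return counts


def rank_sentences(salience: Dict[int, float],
                   coherence: Dict[int, float]) -> List[int]:
    """Order sentence ids by salience * coherence, highest first.
    Sentences with no coherence score are treated as scoring 0."""
    return sorted(salience,
                  key=lambda i: salience[i] * coherence.get(i, 0.0),
                  reverse=True)


if __name__ == "__main__":
    doc = [
        {"model": "S", "summary": "O"},
        {"model": "S", "words": "O"},
        {"words": "S"},
    ]
    print(gsr_transitions(doc))
    print(rank_sentences({0: 0.5, 1: 0.9, 2: 0.4},
                         {0: 0.8, 1: 0.2, 2: 0.9}))  # [0, 2, 1]
```

Using a product rather than a sum means a sentence must score reasonably on both criteria to rank highly: sentence 1 above has the highest salience but is pushed to last by its low coherence.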
Publication date: 2009